<p>
  In order to apply the model, we need to first pull history data to build it. The project can be briefly divided into four parts: the historical data request, model training, prediction and execution.
</p>

<h3>Step 1: Request Historical Data</h3>
<p>
  The first function takes two arguments: symbol and number of daily data points requested. This function requests historical <a href="https://www.quantconnect.com/docs#Consolidating-Data-TradeBars-vs-QuoteBars">QuoteBars</a> and builds it into a pandas DataFrame. For more information about pandas DataFrame, please refer to the help documentation <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html">DataFrame</a>. The calculate_return function takes a DataFrame as an argument to calculate the mean and standard deviation of the log prices, and create new columns for the DataFrame (return, reversal factor and momentum) - it prepares the DataFrame for multiple linear regression.
</p>
<div class="section-example-container">

<pre class="python">def get_history(self,symbol, num):
    data = {}
    dates = []
    history = self.History([symbol], num, Resolution.Daily).loc[symbol]['close'] #request the historical data for a single symbol
    for time in history.index:
        t = time.to_pydatetime().date()
        dates.append(t)
    dates = pd.to_datetime(dates)
    df = pd.DataFrame(history)
    df.reset_index(drop=True)
    df.index = dates
    df.columns = ['price']
    return df

def calculate_return(self,df):
    #calculate the mean for further use
    mean = np.mean(df.price)
    # cauculate the standard deviation
    sd = np.std(df.price)
    # pandas method to take the last datapoint of each month.
    df = df.resample('BM',how = lambda x: x[-1])
    # the following three lines are for further experiment purpose
    # df['j1'] = df.price.shift(1) - df.price.shift(2)
    # df['j2'] = df.price.shift(2) - df.price.shift(3)
    # df['j3'] = df.price.shift(3) - df.price.shift(4)
    # take the return as depend variable
    df['log_return'] = df.price - df.price.shift(1)
    # calculate the reversal factor
    df['reversal'] = (df.price.shift(1) - mean)/sd
    # calculate the momentum factor
    df['mom'] = df.price.shift(1) - df.price.shift(4)
    df = df.dropna() #remove nan value
    return (df,mean,sd)</pre>
</div>

<h3>Step 2: Build Predictive Model</h3>
<p>
  The concat function requests history and joins the results into a single DataFrame. As \(\mu \) varies by country so we assign the mean and standard deviation to the symbol for each currency for future use. The OLS function takes the resulting DataFrame to conduct an OLS regression. We write it into a function because it's easier to change the formula here if we need.
</p>

<div class="section-example-container">

<pre class="python">def concat(self):
    # we requested as many daily tradebars as we can
    his = self.get_history(self.quoted[0].Value,20*365)
    # get the clean DataFrame for linear regression
    his = self.calculate_return(his)
    # add property to the symbol object for further use.
    self.quoted[0].mean = his[1]
    self.quoted[0].sd = his[2]
    df = his[0]
    # repeat the above procedure for each symbols, and concat the dataframes
    for i in range(1,len(self.quoted)):
        his = self.get_history(self.quoted[i].Value,20*365)
        his = self.calculate_return(his)
        self.quoted[i].mean = his[1]
        self.quoted[i].sd = his[2]
        df = pd.concat([df,his[0]])
    df = df.sort_index()
    # remove outliers that outside the 99.9% confidence interval
    df = df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() &lt; 3).all(axis=1)]
    return df

def OLS(self,df):
    res = sm.ols(formula = 'return ~ reversal + mom',data = df).fit()
    return res</pre>
</div>

<h3>Step 3: Apply Predictive Model</h3>
<p>
  The predict function uses the history for the last 3 months, merges it into a DataFrame and then calculates the updated factors. Using these updated factors (together with the model we built) we calculate the expected return.
</p>
<div class="section-example-container">

<pre class="python">def predict(self,symbol):
    # get current month in string
    month = str(self.Time).split(' ')[0][5:7]
    # request the data in the last three months
    res = self.get_history(symbol.Value,33*3)
    # pandas method to take the last datapoint of each month
    res = res.resample('BM',how = lambda x: x[-1])
    # remove the data points in the current month
    res = res[res.index.month != int(month)]
    # calculate the variables
    res = self.calculate_input(res,symbol.mean,symbol.sd)
    res = res.ix[0]
    # take the coefficient. The first one will not be used for sum-product because it's the intercept
    params = self.formula.params[1:]
    # calculate the expected return
    re = sum([a*b for a,b in zip(res[1:],params)]) + self.formula.params[0]
    return re

def calculate_input(self, df, mean, sd):
    df['reversal'] = (df.price - mean)/sd
    df['mom'] = df.price - df.price.shift(3)
    df = df.dropna()
    return df
  </pre>
  </div>

<p>
  There are a few points of note:
</p>

<ul>
 	<li>We need historical TradeBars for the last three months. To do this we requested 99 bars and use a pandas DataFrame to extract a data point for the end of each month.</li>
 	<li>We use event schedule to execute the strategy at the first trading day, however, sometimes the first day of the month could be on the 2nd if the 1st falls on a weekend. To fix this we remove the data from the current month, leaving only the last 3 months of data.</li>
 	<li>We start from the second element of res (res[1:]) because res and params are different lengths. This was hard to detect because Python would not throw error when running [a*b for a,b in zip(res,params)] even if the length of the two lists are different.</li>
 	<li>This function also used pandas DataFrame methods extensively. For more information please refer to <a href="http://pandas.pydata.org">pandas</a>.</li>
</ul>

<h3>Step 4: Initializing the Model</h3>
<p>
  In the Initialize function we prepare the data and conduct a linear regression. The class property 'self.formula' is the result of the OLS regression. We will use this object each time we rebalance the portfolio.
</p>

<div class="section-example-container">

<pre class="python">def Initialize(self):
    self.SetStartDate(2013,6,1)
    self.SetEndDate(2016,6,1)
    self.SetCash(10000)
    self.syls = ['EURUSD','GBPUSD','USDCAD','USDJPY']
    self.quoted = []
    for i in range(len(self.syls)):
        self.quoted.append(self.AddForex(self.syls[i],Resolution.Daily,Market.Oanda).Symbol)
    df = self.concat()
    self.Log(str(df))
    self.formula = self.OLS(df)
    self.Log(str(self.formula.summary()))
    self.Log(str(df))
    self.Log(str(df.describe()))
    for i in self.quoted:
        self.Log(str(i.mean) + '   ' + str(i.sd))
    self.Schedule.On(self.DateRules.MonthStart(), self.TimeRules.At(9,31), Action(self.action))
  </pre>
</div>

<h3>Step 5: Performing Monthly Rebalancing</h3>
<p>
  Every month we rebalance the portfolio using the <a href="https://www.quantconnect.com/docs#Scheduled-Events">Schedule Event</a> helper method. The predicted returns are added to the rank array and then sorted by return. The first element in the list is the best return paired with the associated symbol. When all the expected returns in the rank array are positive we only go long the pair with the highest expected return. When all returns are negative, we only go short the pair with the lowest expected return.
</p>

<div class="section-example-container">

<pre class="python">def action(self):
    rank = []
    long_short = []
    for i in self.quoted:
        rank.append((i,self.predict(i)))
# rank the symbols by their expected return
    rank.sort(key = lambda x: x[1],reverse = True)
# the first element in long_short is the one with the highest expected return, which we are going to long, and the second one is going to be shorted.
    long_short.append(rank[0])
    long_short.append(rank[-1])
    self.Liquidate()

# the product &lt; 0 means the expected return of the first one is positive and that of the second one is negative--we are going to long and short.
    if long_short[0][1]*long_short[1][1] &lt; 0:
        self.SetHoldings(long_short[0][0],1)
        self.SetHoldings(long_short[1][0],-1)
        # this means we long only because all of the expected return is positive
    elif long_short[0][1] &gt; 0 and long_short[1][1] &gt; 0:
        self.SetHoldings(long_short[0][0],1)
# short only
    else:
        self.SetHoldings(long_short[1][0],-1)
      </pre>
</div>
